
Conversation

@jordimas
Collaborator

@jordimas jordimas commented Nov 2, 2025

Changes:

  • Update Windows image to 2022 (2019 is deprecated and no longer available)
  • Update from Python 3.9 (which reached end of life on 2025-10-31) to Python 3.10
  • Update Windows and Linux from CUDA 12.2 to CUDA 12.4
  • Update oneAPI to 2024.0 (the previous SDK download was returning 404)
  • Fixes to compile on macOS

@jordimas jordimas changed the title WIP: Fix build and update to CUDA 12.4 Fix build and update to CUDA 12.4 Nov 2, 2025
@jordimas jordimas merged commit ac8b6e7 into OpenNMT:master Nov 3, 2025
14 checks passed
@jordimas jordimas deleted the fix_ci_cuda_12_4 branch November 4, 2025 20:39
@3manifold

3manifold commented Nov 5, 2025

@jordimas PR #1905 was solving the same issue. I also explained all the error details there. Given that you're the new admin, it is suspicious that you ignored that PR (the only green CI for months) and decided to create a duplicate PR. 😡 You did not even leave a comment or mention that PR 👎

@3manifold

You even "fixed" the CMake warnings, correct? 😠 There is also a ticket and a PR for that as well: #1899 #1907

@ozancaglayan
Contributor

How does this interact with the case where one loads CTranslate2 in a Docker image whose installed CUDA runtime is >= 12.4? Why do we still compile against 12.4, and how does the CUDA_DYNAMIC_LOADING CMake flag interact with the toolkit version the package is compiled against?

@jordimas
Collaborator Author

Hello. None of the changes I made impacts the Docker container; it still compiles and runs with CUDA 12.2 and works fine. If you have observed any problem, please share the details.

Here is a new Dockerfile that supports CUDA 12.4:
#1932

I would appreciate it if you could test it.

@3manifold

@ozancaglayan probably means that python/tools/prepare_build_environment_linux.sh has to be kept in sync with docker/Dockerfile, since their instructions overlap. Ideally, some functions in python/tools/prepare_build_environment_linux.sh could be refactored and reused in docker/Dockerfile.
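One way to reduce the drift described above is to keep the pinned versions in a single place that both the CI script and the Dockerfile consume. A minimal sketch (the file name, the `render_env_file` helper, and all version pins here are hypothetical, not taken from the repository):

```python
# Hypothetical sketch: keep build pins in one file (e.g. versions.env) that
# both python/tools/prepare_build_environment_linux.sh and docker/Dockerfile
# could read, so the two cannot drift apart. Pins are illustrative only.

PINS = {
    "CUDA_VERSION": "12.4",
    "CUDNN_VERSION": "9",
    "ONEAPI_VERSION": "2024.0",
    "PYTHON_VERSION": "3.10",
}

def render_env_file(pins: dict) -> str:
    """Render pins as sorted KEY=VALUE lines, usable via `source versions.env`
    in a shell script or as `--build-arg` values for `docker build`."""
    return "\n".join(f"{key}={value}" for key, value in sorted(pins.items())) + "\n"

if __name__ == "__main__":
    print(render_env_file(PINS), end="")
```

The shell script would then `source versions.env`, and the Dockerfile would take the same values as build arguments, so bumping CUDA happens in exactly one place.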

@ozancaglayan
Contributor

ozancaglayan commented Nov 13, 2025

Sorry for being unclear:

  • I build my own Docker container for CTranslate2 & Faster Whisper
  • I've been using CUDA 12.8 runtime dependencies there through apt-get without any issues
  • I'm not building CTranslate2 from source there, just installing it through pip

But in this repo, related to this PR, I see that there's an explicit compilation stage which used CUDA 12.2 and is now updated to CUDA 12.4.

Probably all minor releases of CUDA 12 are compatible and one can switch to a newer one at runtime, right?

If we were to try CUDA-13 for example, what would be the steps for that?
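The minor-version compatibility assumption in the question above matches NVIDIA's documented rule: within one major release (CUDA 12.x), a binary compiled against one minor version is expected to load with another, while moving to CUDA 13 requires a rebuild. A sketch of that rule (illustrative version strings, not a statement about what CTranslate2 actually checks):

```python
# Sketch of CUDA "minor version compatibility": a wheel compiled against
# CUDA 12.4 is expected to work with any 12.x runtime, while CUDA 13
# would require recompilation. Pure illustration; edge cases (features
# introduced in a newer minor release) are not modeled here.

def is_runtime_compatible(compiled_against: str, runtime: str) -> bool:
    """Return True if a binary built against `compiled_against` is expected
    to load with `runtime`, per CUDA minor-version compatibility."""
    compiled_major = int(compiled_against.split(".")[0])
    runtime_major = int(runtime.split(".")[0])
    return compiled_major == runtime_major

assert is_runtime_compatible("12.4", "12.8")      # same major: expected OK
assert not is_runtime_compatible("12.4", "13.0")  # new major: rebuild needed
```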

@Purfview

What about this issue [int8 doesn't work on 50xx GPUs]: #1865

@jordimas
Collaborator Author

Hello @ozancaglayan

I am not familiar with the CUDA-18 runtime; to my knowledge the latest version is 13.
Since CTranslate2 is compiled with CUDA 12.4, you need a driver that is compatible with CUDA 12.
I cannot speak to your specific combination, but my suggestion is to try it, and if it does not work, please open an issue: https://github.com/OpenNMT/CTranslate2/issues. Thanks.
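The driver requirement mentioned above can be sketched as a lookup from CUDA major version to minimum driver branch. The driver numbers below are approximate values taken from NVIDIA's CUDA release notes for Linux (verify for your platform and forward-compatibility setup):

```python
# Rough sketch: minimum Linux driver branch needed per CUDA major release,
# per NVIDIA's CUDA release notes (approximate; verify for your platform).
MIN_DRIVER_FOR_CUDA_MAJOR = {12: 525, 13: 580}

def driver_supports(driver_major: int, cuda_major: int) -> bool:
    """True if a driver from branch `driver_major` can run binaries built
    against CUDA `cuda_major` (ignoring forward-compatibility packages)."""
    minimum = MIN_DRIVER_FOR_CUDA_MAJOR.get(cuda_major)
    return minimum is not None and driver_major >= minimum
```

So a driver from the 535 branch runs CUDA 12.x wheels, but moving this repo's build to CUDA 13 would also raise the minimum driver users need.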

@ozancaglayan
Contributor

Sorry I meant CUDA 12.8 🤦

@BBC-Esq

BBC-Esq commented Nov 13, 2025

When it comes to CUDA compatibility, review the following for compatibility issues/conflicts. I'm also including compatibility with other well-known libraries like Flash Attention 2, Triton, etc., since most programs typically use all of the above. Any way to make ctranslate2 more compatible?

NOTE: I only cover Windows since that's all I can test, but it shouldn't be too hard to pull similar info for Linux users:

****************************
Torch and CUDA Compatibility
****************************
+-------+-------+--------+----------+
| Torch | Wheel | CUDA   | cuDNN    |
+-------+-------+--------+----------+
| 2.9.0 | cu130 | 13.0.0 | 9.13.0.50|
|       | cu128 | 12.8.1 | 9.10.2.21|
|       | cu126 | 12.6.3 | 9.10.2.21|
+-------+-------+--------+----------+
| 2.8.0 | cu129 | 12.9.1 | 9.10.2.21|
|       | cu128 | 12.8.1 | 9.10.2.21|
|       | cu126 | 12.6.3 | 9.10.2.21|
+-------+-------+--------+----------+
| 2.7.1 | cu128 | 12.8.0 | 9.5.1.17 |
|       | cu126 | 12.6.3 | 9.5.1.17 |
+-------+-------+--------+----------+
| 2.7.0 | cu128 | 12.8.0 | 9.5.1.17 |
|       | cu126 | 12.6.3 | 9.5.1.17 |
+-------+-------+--------+----------+
| 2.6.0 | cu126 | 12.6.3 | 9.5.1.17 |
|       | cu124 | 12.4.1 | 9.1.0.70 |
+-------+-------+--------+----------+
# Python: 2.6-2.8 (>=3.9, <=3.13), 2.9 (>=3.10, <=3.14)
# https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py
# https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix
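For scripting compatibility checks against a matrix like the one above, the rows can be encoded as data. A sketch using a few rows transcribed from the table (treat it as a snapshot, not an authoritative source):

```python
# A few rows from the Torch/CUDA table above, encoded for programmatic
# lookups. Data transcribed from the table; treat as a snapshot.

TORCH_CUDA_WHEELS = {
    "2.9.0": {"cu130": "13.0.0", "cu128": "12.8.1", "cu126": "12.6.3"},
    "2.8.0": {"cu129": "12.9.1", "cu128": "12.8.1", "cu126": "12.6.3"},
    "2.6.0": {"cu126": "12.6.3", "cu124": "12.4.1"},
}

def cuda_for_wheel(torch_version, wheel_tag):
    """Return the CUDA toolkit version a given torch wheel was built with,
    or None if the combination is not in the table."""
    return TORCH_CUDA_WHEELS.get(torch_version, {}).get(wheel_tag)

assert cuda_for_wheel("2.6.0", "cu124") == "12.4.1"
assert cuda_for_wheel("2.9.0", "cu124") is None  # no cu124 wheel for 2.9.0
```

A check like this could flag, for example, that a CTranslate2 build pinned to CUDA 12.4 lines up with the torch 2.6.0 cu124 wheel but not with newer wheel tags.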


*****************************************
Metapackage Versions Within CUDA Releases
*****************************************
+----------+------------+--------------+-----------+-----------+-----------+
| CUDA Ver | cuda-nvrtc | cuda-runtime | cuda-nvcc |  cublas   |   cufft   |
+----------+------------+--------------+-----------+-----------+-----------+
| 12.6.3   | 12.6.77    | 12.6.77      | 12.6.77   | 12.6.4.1  | 11.3.0.4  |
| 12.8.0   | 12.8.61    | 12.8.57      | 12.8.57   | 12.8.3.14 | 11.3.3.41 |
| 12.8.1   | 12.8.93    | 12.8.90      | 12.8.93   | 12.8.4.1  | 11.3.3.83 |
| 12.9.1   | 12.9.86    | 12.9.79      | 12.9.79   | 12.9.1.4  | 11.4.1.4  |
| 13.0.0   | 13.0.48    | 13.0.48      | 13.0.48   | 13.0.0.19 | 12.0.0.15 |
| 13.0.1   | 13.0.88    | 13.0.88      | 13.0.88   | 13.0.2.14 | 12.0.0.61 |
| 13.0.2   | 13.0.88    | 13.0.96      | 13.0.88   | 13.1.0.3  | 12.0.0.61 |
+----------+------------+--------------+-----------+-----------+-----------+
# https://docs.nvidia.com/cuda/archive/12.6.3/cuda-toolkit-release-notes/index.html
# https://developer.download.nvidia.com/compute/cuda/redist/


****************************************
Torch Compatibility with Python & Triton
****************************************
+-------+------------------------+--------+----------+
| Torch | CUDA Versions          | Triton | Sympy    |
+-------+------------------------+--------+----------+
| 2.9.0 | 12.6, 12.8, 12.9, 13.0 | 3.5.0  | >=1.13.3 |
| 2.8.0 | 12.6, 12.8, 12.9       | 3.4.0  | >=1.13.3 |
| 2.7.1 | 12.6, 12.8             | 3.3.1  | >=1.13.3 |
| 2.7.0 | 12.6, 12.8             | 3.3.0  | >=1.13.3 |
| 2.6.0 | 12.4, 12.6             | 3.2.0  | 1.13.1   |
+-------+------------------------+--------+----------+
* Triton 3.1.0 and later wheels: https://github.com/woct0rdho/triton-windows/releases (supports Python 3.12)
* Since triton-windows==3.2.0.post11, windows wheels are published to https://pypi.org/project/triton-windows/
# The METADATA file for each torch wheel shows its compatibility with Python, Triton, and Sympy


*************************
WINDOWS Flash Attention 2
*************************
+-----------------+--------+---------+--------+
| Flash Attention | Python | PyTorch | CUDA   |
+-----------------+--------+---------+--------+
| 2.8.2 & 2.8.3   | 3.10   | 2.6.0   | 12.4.1 |
|                 | 3.11   | 2.6.0   | 12.4.1 |
|                 | 3.12   | 2.6.0   | 12.4.1 |
|                 | 3.13   | 2.6.0   | 12.4.1 |
+-----------------+--------+---------+--------+
| 2.8.2 & 2.8.3   | 3.10   | 2.7.0   | 12.8.1 |
|                 | 3.11   | 2.7.0   | 12.8.1 |
|                 | 3.12   | 2.7.0   | 12.8.1 |
|                 | 3.13   | 2.7.0   | 12.8.1 |
+-----------------+--------+---------+--------+
| 2.8.2 & 2.8.3   | 3.10   | 2.8.0   | 12.8.1 |
|                 | 3.11   | 2.8.0   | 12.8.1 |
|                 | 3.12   | 2.8.0   | 12.8.1 |
|                 | 3.13   | 2.8.0   | 12.8.1 |
+-----------------+--------+---------+--------+
# https://github.com/kingbri1/flash-attention


********
Xformers
********
+------------------+-------+---------------+--------------------------------+
| Xformers Version | Torch |      FA2      |       CUDA (excl. 11.x)        |
+------------------+-------+---------------+--------------------------------+
| v0.0.32.post2    | 2.8.0 | 2.7.1 - 2.8.2 | 12.8.1, 12.9.0                 |
| v0.0.32.post1    | 2.8.0 | 2.7.1 - 2.8.2 | 12.8.1, 12.9.0                 |
| v0.0.32          | 2.7.1 | 2.7.1 - 2.8.2 | 12.8.1, 12.9.0                 | * BUG
| v0.0.31.post1    | 2.7.1 | 2.7.1 - 2.8.0 | 12.8.1                         |
| v0.0.31          | 2.7.1 | 2.7.1 - 2.8.0 | 12.6.3, 12.8.1                 |
| v0.0.30          | 2.7.0 | 2.7.1 - 2.7.4 | 12.6.3, 12.8.1                 |
| v0.0.29.post3    | 2.6.0 | 2.7.1 - 2.7.2 | 12.1.0, 12.4.1, 12.6.3, 12.8.0 |
| v0.0.29.post2    | 2.6.0 | 2.7.1 - 2.7.2 | 12.1.0, 12.4.1, 12.6.3, 12.8.0 |
+------------------+-------+---------------+--------------------------------+
* Torch support: https://github.com/facebookresearch/xformers/blob/main/.github/workflows/wheels.yml
* FA2 support: https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/flash.py
* CUDA support: https://github.com/facebookresearch/xformers/blob/main/.github/actions/setup-build-cuda/action.yml

@BBC-Esq

BBC-Esq commented Nov 13, 2025

> @jordimas PR #1905 was solving the same issue. I also explained all the error details there. Given that you're the new admin, it is suspicious that you ignored that PR (the only green CI for months) and decided to create a duplicate PR. 😡 You did not even leave a comment or mention that PR 👎

I give you credit, man. I've noticed that sometimes people contribute their free time to create a PR only to have someone create another PR that's 95% similar all on their own... and then no recognition.

At the same time, glad that the peeps at CTranslate2 are finally letting someone actually update things for this great repository... and I'm sure @jordimas (who I've communicated with before, good guy) isn't getting paid to do this sort of thing for the repo either so... lol.
